Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks
Semantic labeling (or pixel-level land-cover classification) in ultra-high
resolution imagery (< 10cm) requires statistical models able to learn high
level concepts from spatial data, with large appearance variations.
Convolutional Neural Networks (CNNs) achieve this goal by learning
discriminatively a hierarchy of representations of increasing abstraction.
In this paper we present a CNN-based system relying on a
downsample-then-upsample architecture. Specifically, it first learns a rough
spatial map of high-level representations by means of convolutions and then
learns to upsample them back to the original resolution by deconvolutions. By
doing so, the CNN learns to densely label every pixel at the original
resolution of the image. This results in many advantages, including i)
state-of-the-art numerical accuracy, ii) improved geometric accuracy of
predictions and iii) high efficiency at inference time.
We test the proposed system on the Vaihingen and Potsdam sub-decimeter
resolution datasets, involving semantic labeling of aerial images of 9cm and
5cm resolution, respectively. These datasets are composed of many large and
fully annotated tiles allowing an unbiased evaluation of models making use of
spatial information. We do so by comparing two standard CNN architectures with
the proposed one: standard patch classification, prediction of local label
patches by employing only convolutions, and full patch labeling by employing
deconvolutions. All the systems compare favorably with or outperform a
state-of-the-art baseline relying on superpixels and powerful appearance
descriptors. The proposed full patch labeling CNN outperforms these models by a
large margin, while also showing a very appealing inference time.
Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 201
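The downsample-then-upsample idea described in this abstract can be illustrated with a minimal 1-D sketch in plain Python. The kernel weights, stride, padding, and the helper names `conv1d` and `deconv1d` are assumptions chosen for illustration, not the configuration used in the paper. The sketch only shows that a strided convolution produces a coarser map and that a transposed convolution (deconvolution) with matching settings restores the original resolution, so that every input position receives an output value.

```python
def conv1d(x, kernel, stride=1, pad=0):
    """Strided 1-D convolution (cross-correlation) with zero padding."""
    xp = [0.0] * pad + list(x) + [0.0] * pad
    k = len(kernel)
    return [
        sum(xp[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(xp) - k + 1, stride)
    ]


def deconv1d(x, kernel, stride=1, pad=0):
    """Transposed 1-D convolution: each input value scatters a scaled kernel."""
    k = len(kernel)
    full = [0.0] * ((len(x) - 1) * stride + k)
    for i, v in enumerate(x):
        for j, w in enumerate(kernel):
            full[i * stride + j] += v * w
    # Crop the border that corresponds to the padding of the forward pass.
    return full[pad:len(full) - pad]


signal = [float(i % 4) for i in range(16)]  # toy "image row" with 16 pixels
down_kernel = [0.25, 0.25, 0.25, 0.25]      # hypothetical weights (learned in a real CNN)
up_kernel = [0.5, 1.0, 1.0, 0.5]            # hypothetical weights (learned in a real CNN)

coarse = conv1d(signal, down_kernel, stride=2, pad=1)  # 16 -> 8 values
dense = deconv1d(coarse, up_kernel, stride=2, pad=1)   # 8 -> 16 values
print(len(signal), len(coarse), len(dense))
```

With a kernel of size 4, stride 2, and padding 1, the forward pass halves the length and the transposed pass doubles it again, which is why the output can be compared position by position with a dense label map.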
Kernel Manifold Alignment
We introduce a kernel method for manifold alignment (KEMA) and domain
adaptation that can match an arbitrary number of data sources without needing
corresponding pairs, just a few labeled examples in all domains. KEMA has
interesting properties: 1) it generalizes other manifold alignment methods, 2)
it can align manifolds of very different complexities, performing a sort of
manifold unfolding plus alignment, 3) it can define a domain-specific metric to
cope with multimodal specificities, 4) it can align data spaces of different
dimensionality, 5) it is robust to strong nonlinear feature deformations, and
6) it is invertible in closed form, which allows transfer across domains and data
synthesis. We also present a reduced-rank version for computational efficiency
and discuss the generalization performance of KEMA under Rademacher principles
of stability. KEMA exhibits very good performance over competing methods in
synthetic examples, visual object recognition, and facial expression
recognition tasks.
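One ingredient behind property 4 can be sketched in a few lines: a kernel (Gram) matrix has one row and one column per sample, so its size does not depend on the dimensionality of the input features. The sketch below is an illustration only and is not the KEMA alignment step itself. The RBF kernel, the `gamma` value, the function name `rbf_kernel_matrix`, and the toy data are all assumptions.

```python
import math


def rbf_kernel_matrix(samples, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [[math.exp(-gamma * sqdist(a, b)) for b in samples] for a in samples]


# Hypothetical toy domains with different feature dimensionality.
domain_a = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]                    # 3 samples, 2-D
domain_b = [(0.0, 1.0, 2.0, 3.0, 4.0), (1.0, 0.0, 1.0, 0.0, 1.0)]  # 2 samples, 5-D

K_a = rbf_kernel_matrix(domain_a)
K_b = rbf_kernel_matrix(domain_b)
print(len(K_a), len(K_a[0]))  # 3 3
print(len(K_b), len(K_b[0]))  # 2 2
```

Once each domain is represented by its own kernel matrix, later processing can work with per-sample quantities, whatever the original feature space was.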
Non-convex regularization in remote sensing
In this paper, we study the effect of different regularizers and their
implications in high dimensional image classification and sparse linear
unmixing. Although kernelization or sparse methods are globally accepted
solutions for processing data in high dimensions, we present here a study on
the impact of the form of regularization used and its parametrization. We
consider regularization via traditional squared (ℓ2) and sparsity-promoting (ℓ1)
norms, as well as more unconventional nonconvex regularizers (ℓp and Log Sum
Penalty). We compare their properties and advantages on several classification
and linear unmixing tasks and provide advice on the choice of the best
regularizer for the problem at hand. Finally, we also provide a fully
functional toolbox for the community.
Comment: 11 pages, 11 figures
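The four families of regularizers named in this abstract can be written down in their common textbook forms, as in the sketch below. The exact parametrization used in the paper and its toolbox may differ: the choice `p=0.5`, the smoothing constant `eps=0.1`, and the function names are assumptions for illustration. The comparison uses two weight vectors with the same squared ℓ2 penalty, one sparse and one dense.

```python
import math


def squared_l2(w):
    """Squared l2 norm: convex, does not promote sparsity."""
    return sum(v * v for v in w)


def l1(w):
    """l1 norm: convex, sparsity-promoting."""
    return sum(abs(v) for v in w)


def lp(w, p=0.5):
    """Sum of |w_i|^p: nonconvex for 0 < p < 1."""
    return sum(abs(v) ** p for v in w)


def log_sum_penalty(w, eps=0.1):
    """Log Sum Penalty: sum of log(1 + |w_i| / eps), nonconvex."""
    return sum(math.log(1.0 + abs(v) / eps) for v in w)


sparse_w = [1.0, 0.0, 0.0, 0.0]  # toy vector with one active coefficient
dense_w = [0.5, 0.5, 0.5, 0.5]   # toy vector with the same squared l2 penalty

for name, penalty in [("squared l2", squared_l2), ("l1", l1),
                      ("lp (p=0.5)", lp), ("log-sum", log_sum_penalty)]:
    print(f"{name:12s} sparse={penalty(sparse_w):.3f} dense={penalty(dense_w):.3f}")
```

The squared ℓ2 penalty cannot tell the two vectors apart, whereas ℓ1 and the two nonconvex penalties all charge the dense vector more, which is the sense in which they favor sparse solutions.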
Detecting animals in African Savanna with UAVs and the crowds
Unmanned aerial vehicles (UAVs) offer new opportunities for wildlife
monitoring, with several advantages over traditional field-based methods. They
have readily been used to count birds, marine mammals and large herbivores in
different environments, tasks which are routinely performed through manual
counting in large collections of images. In this paper, we propose a
semi-automatic system able to detect large mammals in semi-arid Savanna. It
relies on an animal-detection system based on machine learning, trained with
crowd-sourced annotations provided by volunteers who manually interpreted
sub-decimeter resolution color images. The system achieves a high recall rate
and a human operator can then eliminate false detections with limited effort.
Our system offers good prospects for the development of data-driven
management practices in wildlife conservation. It shows that the detection of
large mammals in semi-arid Savanna can be approached by processing data
provided by standard RGB cameras mounted on affordable fixed-wing UAVs.
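The high-recall-then-review workflow described in this abstract can be illustrated with a small arithmetic sketch. The detection scores, the ground-truth flags, the thresholds, and the function name `recall_precision` are made up for illustration and are not results from the paper.

```python
def recall_precision(detections, n_animals, threshold):
    """Recall and precision of the detections scoring at least `threshold`."""
    kept = [is_animal for score, is_animal in detections if score >= threshold]
    true_pos = sum(kept)
    recall = true_pos / n_animals
    precision = true_pos / len(kept) if kept else 1.0
    return recall, precision


# Hypothetical detector output: (confidence score, is it really an animal?)
detections = [(0.9, True), (0.8, True), (0.7, False), (0.6, True),
              (0.4, False), (0.3, True), (0.2, False)]
n_animals = 4  # toy ground truth: four animals in the scene

strict = recall_precision(detections, n_animals, threshold=0.75)
permissive = recall_precision(detections, n_animals, threshold=0.25)
print("strict threshold:     recall=%.2f precision=%.2f" % strict)
print("permissive threshold: recall=%.2f precision=%.2f" % permissive)

# A human operator then reviews only the candidates kept by the permissive
# threshold and discards the false detections among them.
to_review = [d for d in detections if d[0] >= 0.25]
print("candidates to review:", len(to_review))
```

In this toy case the permissive threshold finds every animal at the cost of two false detections, and the operator has six candidates to check instead of searching the full image collection.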
Optimal Transport for Domain Adaptation
Domain adaptation from one data space (or domain) to another is one of the
most challenging tasks of modern data analytics. If the adaptation is done
correctly, models built on a specific data space become more robust when
confronted with data depicting the same semantic concepts (the classes), but
observed by another observation system with its own specificities. Among the
many strategies proposed to adapt one domain to another, finding a common
representation has shown excellent properties: by finding a common
representation for both domains, a single classifier can be effective in both
and use labelled samples from the source domain to predict the unlabelled
samples of the target domain. In this paper, we propose a regularized
unsupervised optimal transportation model to perform the alignment of the
representations in the source and target domains. We learn a transportation
plan matching both PDFs, which constrains labelled samples in the source domain
to remain close during transport. This way, we exploit at the same time the
limited labeled information in the source and the unlabelled distributions
observed in both domains. Experiments on toy and challenging real visual
adaptation examples show the value of the method, which consistently
outperforms state-of-the-art approaches.
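A transportation plan between two empirical distributions can be sketched with entropy-regularized optimal transport solved by Sinkhorn iterations. This is a generic regularized formulation swapped in for illustration: it does not include the constraint on labelled source samples that the abstract describes, and the toy 1-D data, the value of `reg`, the iteration count, and the function name `sinkhorn` are assumptions.

```python
import math


def sinkhorn(a, b, cost, reg=1.0, n_iter=1000):
    """Entropy-regularized transport plan between histograms a and b."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


source = [0.0, 1.0, 2.0]  # toy source samples (1-D features)
target = [0.5, 1.5, 2.5]  # toy target samples: the source shifted by 0.5
a = [1.0 / len(source)] * len(source)  # uniform weights on source samples
b = [1.0 / len(target)] * len(target)  # uniform weights on target samples
cost = [[(s - t) ** 2 for t in target] for s in source]

plan = sinkhorn(a, b, cost)

# Barycentric mapping: move each source sample to the weighted mean of the
# target samples it is coupled with.
mapped = [
    sum(plan[i][j] * target[j] for j in range(len(target))) / sum(plan[i])
    for i in range(len(source))
]
print([round(x, 3) for x in mapped])
```

The rows of `plan` sum to the source weights and its columns to the target weights, so the plan redistributes the source mass onto the target samples. A classifier trained on the mapped source samples can then be applied in the target domain.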
Land cover mapping at very high resolution with rotation equivariant CNNs: towards small yet accurate models
In remote sensing images, the absolute orientation of objects is arbitrary.
Depending on an object's orientation and on a sensor's flight path, objects of
the same semantic class can be observed in different orientations in the same
image. Equivariance to rotation, in this context understood as responding with
a rotated semantic label map when subject to a rotation of the input image, is
therefore a very desirable feature, in particular for high capacity models,
such as Convolutional Neural Networks (CNNs). If rotation equivariance is
encoded in the network, the model is confronted with a simpler task and does
not need to learn specific (and redundant) weights to address rotated versions
of the same object class. In this work we propose a CNN architecture called
Rotation Equivariant Vector Field Network (RotEqNet) to encode rotation
equivariance in the network itself. By using rotating convolutions as building
blocks and passing only the values corresponding to the maximally
activating orientation throughout the network in the form of orientation
encoding vector fields, RotEqNet treats rotated versions of the same object
with the same filter bank and therefore achieves state-of-the-art performance
even when using very small architectures trained from scratch. We test RotEqNet
in two challenging sub-decimeter resolution semantic labeling problems, and
show that we can perform better than a standard CNN while requiring one order
of magnitude fewer parameters.
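The idea of applying rotated copies of a single filter and keeping only the maximally activating orientation can be sketched on a single patch. As a simplifying assumption, the sketch uses only the four rotations by multiples of 90 degrees, which are exact on a pixel grid. The filter, the patch, and the function names are also illustrative and not taken from the paper.

```python
def rot90(m):
    """Rotate a square matrix by 90 degrees counter-clockwise."""
    n = len(m)
    return [[m[j][n - 1 - i] for j in range(n)] for i in range(n)]


def oriented_response(patch, filt):
    """Apply 4 rotated copies of one filter; keep the max response and its angle."""
    best_value, best_angle = None, None
    rotated = filt
    for k in range(4):
        value = sum(p * f
                    for prow, frow in zip(patch, rotated)
                    for p, f in zip(prow, frow))
        if best_value is None or value > best_value:
            best_value, best_angle = value, 90 * k
        rotated = rot90(rotated)
    return best_value, best_angle


edge_filter = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]  # hypothetical edge detector
patch = [[2, 2, 2], [1, 1, 1], [0, 0, 0]]           # toy patch with a vertical gradient

print(oriented_response(patch, edge_filter))         # (6, 0)
print(oriented_response(rot90(patch), edge_filter))  # (6, 90)
```

Rotating the input leaves the magnitude of the response unchanged and shifts the reported orientation by the same angle, so a single filter covers all four orientations of the pattern. The pair of magnitude and angle plays the role of one vector in an orientation-encoding vector field.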
Advances in Hyperspectral Image Classification: Earth monitoring with statistical learning methods
Hyperspectral images show similar statistical properties to natural grayscale
or color photographic images. However, the classification of hyperspectral
images is more challenging because of the very high dimensionality of the
pixels and the small number of labeled examples typically available for
learning. These peculiarities lead to particular signal processing problems,
mainly characterized by indetermination and complex manifolds. The framework of
statistical learning has gained popularity in the last decade. New methods have
been presented to account for the spatial homogeneity of images, to include
user interaction via active learning, to take advantage of the manifold
structure with semisupervised learning, to extract and encode invariances, or
to adapt classifiers and image representations to unseen yet similar scenes.
This tutorial reviews the main advances for hyperspectral remote sensing image
classification through illustrative examples.
Comment: IEEE Signal Processing Magazine, 201